Indexing and Querying Content and Structure of XML Documents According to the Vector Space Model
Author
Abstract
This paper presents a method to index and query the content and structure of XML documents according to the vector space model. Indexing is performed in three steps: (i) choosing content elements, i.e. those which refer to the semantic content of the documents; (ii) associating with each terminal content element a vector whose components are the weights of the indexing terms; (iii) propagating these vectors bottom-up along the ancestors of the terminal content elements. Querying is performed with XQuery, extended in two ways: by adding a vscore function, which returns the degree of similarity between a query vector and a content element vector, and by integrating the NEXI language, a subset of XPath defined in the framework of the INEX initiative, which we have equipped with a fuzzy semantics.
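The three indexing steps and the vscore function can be illustrated with a minimal Python sketch. The abstract does not specify the weighting scheme, the propagation rule, or the similarity measure, so everything concrete here is an assumption made for illustration: raw term frequency as the weight, a plain sum of child vectors as the propagation rule, cosine similarity as the score, and the hypothetical `CONTENT_TAGS` set as the choice of content elements. It is not the authors' implementation, and the XQuery and NEXI integration is not covered.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Step (i): hypothetical set of content elements; the paper's own
# selection criteria are not given in the abstract.
CONTENT_TAGS = {"article", "section", "title", "para"}


def term_vector(text):
    """Step (ii): build a term vector. Assumed weight: raw term frequency."""
    return Counter(text.lower().split())


def index(elem, vectors):
    """Compute the vector of `elem` and record it in `vectors`.

    Terminal content elements are indexed from their text; other content
    elements receive the sum of their content children's vectors
    (step (iii), assumed aggregation rule). Text sitting directly inside
    a non-terminal element is ignored in this sketch.
    """
    children = [child for child in elem if child.tag in CONTENT_TAGS]
    if not children:
        vec = term_vector("".join(elem.itertext()))
    else:
        vec = Counter()
        for child in children:
            vec.update(index(child, vectors))  # bottom-up propagation
    vectors[elem] = vec
    return vec


def vscore(query_vec, elem_vec):
    """Similarity between a query vector and an element vector.

    Assumed measure: cosine similarity; returns 0.0 for an empty vector.
    """
    dot = sum(w * elem_vec.get(term, 0) for term, w in query_vec.items())
    norm = math.sqrt(sum(w * w for w in query_vec.values())) * math.sqrt(
        sum(w * w for w in elem_vec.values())
    )
    return dot / norm if norm else 0.0


doc = ET.fromstring(
    "<article><title>vector space model</title>"
    "<section><para>xml indexing</para><para>xml querying</para></section>"
    "</article>"
)
vectors = {}
index(doc, vectors)

# Rank all indexed elements against the one-term query "xml".
query = term_vector("xml")
ranking = sorted(vectors, key=lambda e: vscore(query, vectors[e]), reverse=True)
```

Because vectors are propagated to ancestors, a query can be scored against an element at any level of the tree (a paragraph, a section, or the whole article), which is what lets a structural query pick the granularity of the answer.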
Similar resources
Querying and Ranking XML Documents Based on Data Synopses
There is an increasing interest in recent years for querying and ranking XML documents. In this paper, we present a new framework for querying and ranking schema-less XML documents based on concise summaries of their structural and textual content. We introduce a novel data synopsis structure to summarize the textual content of an XML document for efficient indexing. More importantly, we extend...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Due to the increasing number of XML documents, organizing these documents effectively in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Searching XML Documents - Preliminary Work
Structured document retrieval aims at exploiting the structure together with the content of documents to improve retrieval results. Several aspects of traditional information retrieval applied on flat documents have to be reconsidered. These include in particular, document representation, storage, indexing, retrieval, and ranking. This paper outlines the architecture of our system and the adapt...
A Generic Framework for Querying and Updating Secondary XML Index Structures
To cope with the increasing number and size of XML documents, XML databases provide index structures to accelerate queries on the content and structure of documents. To adapt indices to the query workload, XML databases require various secondary index structures. This paper presents a generic index framework called sciens (Structure and Content Indexing with Extensible, Nestable Structures). In...
Semantic Video Annotation and Vague Query
The Digital Video Album (DVA) system described here integrates various cooperating subsystems to index and query video documents according to their semantic content and other metadata. A simple structured model is proposed to represent the video content. This model is compatible with XML Schemas and supports typed attributes and composition relationships. The architecture of DVA is described, a...
Publication date: 2005